Abstract

The left-corner transform removes left-recursion from (probabilistic)
context-free grammars and unification grammars, permitting simple top-down
parsing techniques to be used. Unfortunately the grammars produced by the
standard left-corner transform are usually much larger than the original.
The selective left-corner transform described in this paper produces a
transformed grammar which simulates left-corner recognition of a
user-specified set of the original productions, and top-down recognition
of the others. Combined with two factorizations, it produces
non-left-recursive grammars that are not much larger than the original.

* This research was supported by NSF awards 9720368, 9870676 and 9812169.
We would like to thank our colleagues in BLLIP (Brown Laboratory for
Linguistic Information Processing) and Bob Moore for their helpful
comments on this paper.

1 Introduction

Top-down parsing techniques are attractive because of their simplicity,
and can often achieve good performance in practice (Roark and Johnson,
1999). However, with a left-recursive grammar such parsers typically fail
to terminate. The left-corner grammar transform converts a left-recursive
grammar into a non-left-recursive one: a top-down parser using a
left-corner transformed grammar simulates a left-corner parser using the
original grammar (Rosenkrantz and Lewis II, 1970; Aho and Ullman, 1972).
However, the left-corner transformed grammar can be significantly larger
than the original grammar, causing numerous problems. For example, we show
below that a probabilistic context-free grammar (PCFG) estimated from
left-corner transformed Penn WSJ tree-bank trees exhibits considerably
greater sparse data problems than a PCFG estimated in the usual manner,
simply because the left-corner transformed grammar contains approximately
20 times more productions. The transform described in this paper produces
a grammar approximately the same size as the input grammar, which is not
as adversely affected by sparse data.

Left-corner transforms are particularly useful because they can preserve
annotations on productions (more on this below) and are therefore
applicable to more complex grammar formalisms as well as CFGs, a property
which other approaches to left-recursion elimination typically lack. For
example, they apply to left-recursive unification-based grammars
(Matsumoto et al., 1983; Pereira and Shieber, 1987; Johnson, 1998a).
Because the emission probability of a PCFG production can be regarded as
an annotation on a CFG production, the left-corner transform can produce a
CFG with weighted productions which assigns the same probabilities to
strings and transformed trees as the original grammar (Abney et al.,
1999). However, the transformed grammars can be much larger than the
original, which is unacceptable for many applications involving large
grammars.

The selective left-corner transform reduces the transformed grammar size
because only those productions which appear in a left-recursive cycle need
be recognized left-corner in order to remove left-recursion. A top-down
parser using a grammar produced by the selective left-corner transform
simulates a generalized left-corner parser (Demers, 1977; Nijholt, 1980)
which recognizes a user-specified subset of the original productions in a
left-corner fashion, and the other productions top-down.

Although we do not investigate it in this paper, the selective left-corner
transform should usually have a smaller search space relative to the
standard left-corner transform, all else being equal. The partial parses
produced during a top-down parse consist of a single connected tree
fragment, while the partial parses produced during a left-corner parse
generally consist of several disconnected tree fragments. Since these
fragments are only weakly related (via the "link" constraint described
below), the search for each fragment is relatively independent. This may
be responsible for the observation that exhaustive left-corner parsing is
less efficient than top-down parsing (Covington, 1994). Informally,
because the selective left-corner transform recognizes only a subset of
productions in a left-corner fashion, its partial parses contain fewer
discontiguous tree fragments and the search may be more efficient.

While this paper focuses on reducing grammar size to minimize sparse data
problems in PCFG estimation, the modified left-corner transforms described
here are generally applicable wherever the original left-corner transform
is. For example, the selective left-corner transform can be used in place
of the standard left-corner transform in the construction of finite-state
approximations (Johnson, 1998a), often reducing the size of the
intermediate automata constructed. The selective left-corner transform can
be generalized to head-corner parsing (van Noord, 1997), yielding a
selective head-corner parser. (This follows from generalizing the
selective left-corner transform to Horn clauses).

After this paper was accepted for publication we learnt of Moore (2000),
which addresses the issue of grammar size using very similar techniques to
those proposed here. The goals of the two papers are slightly different:
Moore's approach is designed to reduce the total grammar size (i.e., the
sum of the lengths of the productions), while our approach minimizes the
number of productions. Moore (2000) does not address left-corner
tree-transforms, or questions of sparse data and parsing accuracy that are
covered in section 3.



2 The selective left-corner and 
related transforms 

This section introduces the selective left-corner transform and two
additional factorization transforms which apply to its output. These
transforms are used in the experiments described in the following section.
As Moore (2000) observes, in general the transforms produce a
non-left-recursive output grammar only if the input grammar $G$ does not
contain unary cycles, i.e., there is no nonterminal $A$ such that
$A \Rightarrow^{+} A$.



2.1 The selective left-corner transform 

The selective left-corner transform takes as input a CFG
$G = (V, T, P, S)$ and a set of left-corner productions $L \subseteq P$,
which contains no epsilon productions; the non-left-corner productions
$P - L$ are called top-down productions. The standard left-corner
transform is obtained by setting $L$ to the set of all non-epsilon
productions in $P$. The selective left-corner transform of $G$ with
respect to $L$ is the CFG $\mathcal{LC}_L(G) = (V_1, T, P_1, S)$, where:

$V_1 = V \cup \{D\text{-}X : D \in V, X \in V \cup T\}$

and $P_1$ contains all instances of the schemata 1. In these schemata,
$D \in V$, $w \in T$, and lower case greek letters range over
$(V \cup T)^{\star}$. The $D\text{-}X$ are new nonterminals; informally
they encode a parse state in which a $D$ is predicted top-down and an $X$
has been found left-corner, so
$D\text{-}X \Rightarrow^{\star}_{\mathcal{LC}_L(G)} \gamma$ only if
$D \Rightarrow^{\star}_{G} X\gamma$.

$D \rightarrow w\ D\text{-}w$                                                  (1a)
$D \rightarrow \alpha\ D\text{-}A$    where $A \rightarrow \alpha \in P - L$     (1b)
$D\text{-}B \rightarrow \beta\ D\text{-}C$    where $C \rightarrow B\,\beta \in L$   (1c)
$D\text{-}D \rightarrow \epsilon$                                              (1d)

Figure 1: Schematic parse trees generated by the original grammar $G$ and
the selective left-corner transformed grammar $\mathcal{LC}_L(G)$. The
shaded local trees in the original parse tree correspond to left-corner
productions; the corresponding local trees (generated by instances of
schema 1c) in the selective left-corner transformed tree are also shown
shaded. The local tree colored black is generated by an instance of
schema 1b.


The schemata function as follows. The productions introduced by schema 1a
start a left-corner parse of a predicted nonterminal $D$ with its leftmost
terminal $w$, while those introduced by schema 1b start a left-corner
parse of $D$ with a left-corner $A$, which is itself found by the top-down
recognition of production $A \rightarrow \alpha \in P - L$. Schema 1c
extends the current left-corner $B$ up to a $C$ with the left-corner
recognition of production $C \rightarrow B\,\beta$. Finally, schema 1d
matches the top-down prediction with the recognized left-corner category.
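
The four schemata can be sketched directly as a small program. The following is a minimal illustration, not the implementation used in the experiments: it assumes productions are stored as `(lhs, rhs)` pairs with `rhs` a tuple of symbols, represents each new nonterminal D-X as the pair `(D, X)`, and applies no pruning. The function name `selective_left_corner` and the toy grammar are hypothetical.

```python
def selective_left_corner(V, T, P, L):
    """Productions generated by schemata 1a-1d, without any pruning.

    V, T: sets of nonterminal / terminal symbols (strings).
    P: set of productions (lhs, rhs), rhs a tuple of symbols.
    L: subset of P to recognize left-corner (no epsilon productions).
    The new nonterminal D-X is represented by the pair (D, X).
    """
    assert L <= P and all(rhs for _, rhs in L)
    out = set()
    for D in V:
        for w in T:                      # 1a: D -> w D-w
            out.add((D, (w, (D, w))))
        for A, alpha in P - L:           # 1b: D -> alpha D-A, A -> alpha top-down
            out.add((D, alpha + ((D, A),)))
        for C, rhs in L:                 # 1c: D-B -> beta D-C, C -> B beta in L
            B, beta = rhs[0], rhs[1:]
            out.add(((D, B), beta + ((D, C),)))
        out.add(((D, D), ()))            # 1d: D-D -> epsilon
    return out


# Toy grammar (hypothetical): S -> S a | b, with the left-recursive
# production S -> S a recognized left-corner and S -> b top-down.
V, T = {"S"}, {"a", "b"}
P = {("S", ("S", "a")), ("S", ("b",))}
L = {("S", ("S", "a"))}
for lhs, rhs in sorted(selective_left_corner(V, T, P, L), key=repr):
    print(lhs, "->", rhs)
```

In the toy output no production of the original nonterminal S begins with S, and the left recursion of S -> S a reappears as right recursion on the pair `("S", "S")`.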

Figure 1 schematically depicts the relationship between a chain of
left-corner productions in a parse tree generated by $G$ and the chain of
corresponding instances of schema 1. The left-corner recognition of the
chain starts with the recognition of $\alpha$, the right-hand side of a
top-down production $A \rightarrow \alpha$, using an instance of
schema 1b. The left-branching chain of left-corner productions corresponds
to a right-branching chain of instances of schema 1c; the left-corner
transform in effect converts left recursion into right recursion. Notice
that the top-down predicted category $D$ is passed down this
right-recursive chain, effectively multiplying each left-corner production
by the possible top-down predicted categories. The right recursion
terminates with an instance of schema 1d when the left-corner and top-down
categories match.

Figure 2 shows how top-down productions from $G$ are recognized using
$\mathcal{LC}_L(G)$. When the selective left-corner transform is followed
by a one-step $\epsilon$-removal transform (i.e., composition or partial
evaluation of schema 1b with respect to schema 1d (Johnson, 1998a; Abney
and Johnson, 1991; Resnik, 1992)), each top-down production from $G$
appears unchanged in the final grammar. Full $\epsilon$-removal yields the
grammar given by the schemata below.

$D \rightarrow w\ D\text{-}w$
$D \rightarrow w$    where $D \Rightarrow^{+}_{L} w$
$D \rightarrow \alpha\ D\text{-}A$    where $A \rightarrow \alpha \in P - L$
$D \rightarrow \alpha$    where $D \Rightarrow^{\star}_{L} A$, $A \rightarrow \alpha \in P - L$
$D\text{-}B \rightarrow \beta\ D\text{-}C$    where $C \rightarrow B\,\beta \in L$
$D\text{-}B \rightarrow \beta$    where $D \Rightarrow^{\star}_{L} C$, $C \rightarrow B\,\beta \in L$

Figure 2: The recognition of a top-down production $A \rightarrow \alpha$
by $\mathcal{LC}_L(G)$ involves a left-corner category $A\text{-}A$, which
immediately rewrites to $\epsilon$. One-step $\epsilon$-removal applied to
$\mathcal{LC}_L(G)$ produces a grammar in which each top-down production
$A \rightarrow \alpha$ corresponds to a production $A \rightarrow \alpha$
in the transformed grammar.

Moore (2000) introduces a version of the left-corner transform called
$\mathrm{LC}_{LR}$, which applies only to productions with left-recursive
parent and left child categories. In the context of the other transforms
that Moore introduces, it seems to have the same effect in his system as
the selective left-corner transform does here.

2.2 Selective left-corner tree transforms

There is a 1-to-1 correspondence between the parse trees generated by $G$
and $\mathcal{LC}_L(G)$. A tree $t$ is generated by $G$ iff there is a
corresponding $t'$ generated by $\mathcal{LC}_L(G)$, where each occurrence
of a top-down production in the derivation of $t$ corresponds to exactly
one local tree generated by an occurrence of the corresponding instance of
schema 1b in the derivation of $t'$, and each occurrence of a left-corner
production in $t$ corresponds to exactly one occurrence of the
corresponding instance of schema 1c in $t'$. It is straightforward to
define a 1-to-1 tree transform $T_L$ mapping parse trees of $G$ into parse
trees of $\mathcal{LC}_L(G)$ (Johnson, 1998a; Roark and Johnson, 1999). In
the empirical evaluation below, we estimate a PCFG from the trees obtained
by applying $T_L$ to the trees in the Penn WSJ tree-bank, and compare it
to the PCFG estimated from the original tree-bank trees. A stochastic
top-down parser using the PCFG estimated from the trees produced by $T_L$
simulates a stochastic generalized left-corner parser, which is a
generalization of a standard stochastic left-corner parser that permits
productions to be recognized top-down as well as left-corner (Manning and
Carpenter, 1997). Thus investigating the properties of PCFGs estimated
from trees transformed with $T_L$ is an easy way of studying stochastic
push-down automata performing generalized left-corner parses.

2.3 Pruning useless productions 

We turn now to the problem of reducing the size of the grammars produced
by left-corner transforms. Many of the productions generated by schemata 1
are useless, i.e., they never appear in any terminating derivation. While
they can be removed by standard methods for deleting useless productions
(Hopcroft and Ullman, 1979), the relationship between the parse trees of
$G$ and $\mathcal{LC}_L(G)$ depicted in Figure 1 shows how to determine
ahead of time the new nonterminals $D\text{-}X$ that can appear in useful
productions of $\mathcal{LC}_L(G)$. This is known as a link constraint.

For (P)CFGs there is a particularly simple link constraint: $D\text{-}X$
appears in useful productions of $\mathcal{LC}_L(G)$ only if
$\exists \gamma \in (V \cup T)^{\star}.\ D \Rightarrow^{\star}_{L} X\gamma$.
If epsilon removal is applied to the resulting grammar, $D\text{-}X$
appears in useful productions only if
$\exists \gamma \in (V \cup T)^{+}.\ D \Rightarrow^{\star}_{L} X\gamma$.
Thus one only need generate instances of the left-corner schemata which
satisfy the corresponding link constraints.

Moore (2000) suggests an additional constraint on nonterminals
$D\text{-}X$ that can appear in useful productions of $\mathcal{LC}_L(G)$:
$D$ must either be the start symbol of $G$ or else appear in a production
$A \rightarrow \alpha D \beta$ of $G$, for any $A \in V$,
$\alpha \in (V \cup T)^{+}$ and $\beta \in (V \cup T)^{\star}$. It is easy
to see that the productions that Moore's constraint prohibits are useless.
There is one nonterminal in the tree-bank grammar investigated below that
Moore's constraint excludes, namely LST. However, in the tree-bank grammar
none of the productions expanding LST are left-recursive (in fact, the
first child is always a preterminal), so Moore's constraint does not
affect the size of the transformed grammars investigated below.

While these constraints can dramatically reduce both the number of
productions and the size of the parsing search space of the transformed
grammar, in general the transformed grammar $\mathcal{LC}_L(G)$ can be
quadratically larger than $G$. There are two causes for the explosion in
grammar size. First, $\mathcal{LC}_L(G)$ contains an instance of schema 1b
for each top-down production $A \rightarrow \alpha$ and each $D$ such that
$\exists \gamma.\ D \Rightarrow^{\star}_{L} A\gamma$. Second,
$\mathcal{LC}_L(G)$ contains an instance of schema 1c for each left-corner
production $C \rightarrow B\,\beta$ and each $D$ such that
$\exists \gamma.\ D \Rightarrow^{\star}_{L} C\gamma$. In effect,
$\mathcal{LC}_L(G)$ contains one copy of each production for each possible
left-corner ancestor. Section 2.5 describes further factorizations of the
productions of $\mathcal{LC}_L(G)$ which mitigate these causes.
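
The link constraint for (P)CFGs amounts to a reachability computation over the left-corner productions. The sketch below is one way to compute it, under the assumption that the link relation holds between D and X exactly when X can be reached from D by repeatedly taking the first right-hand-side symbol of a production in L (including zero steps). The helper names `link_pairs` and `prune_with_links`, the pair encoding of D-X, and the toy grammar are all hypothetical.

```python
def link_pairs(V, L):
    """Pairs (D, X) such that X is a reflexive-transitive left corner of D
    using only the left-corner productions in L."""
    direct = {}                                  # C -> first symbols of its L productions
    for C, rhs in L:
        direct.setdefault(C, set()).add(rhs[0])
    links = set()
    for D in V:
        seen, stack = {D}, [D]
        while stack:                             # depth-first closure from D
            for B in direct.get(stack.pop(), ()):
                if B not in seen:
                    seen.add(B)
                    stack.append(B)
        links.update((D, X) for X in seen)
    return links


def prune_with_links(productions, links):
    """Keep productions whose D-X symbols (encoded as pairs) all satisfy a link."""
    def ok(sym):
        return not isinstance(sym, tuple) or sym in links
    return {(lhs, rhs) for lhs, rhs in productions
            if ok(lhs) and all(ok(s) for s in rhs)}


# Toy grammar (hypothetical): S -> S a | b with L = {S -> S a}.
# The five productions below are the unpruned instances of schemata 1a-1d.
L = {("S", ("S", "a"))}
unpruned = {
    ("S", ("a", ("S", "a"))),          # 1a
    ("S", ("b", ("S", "b"))),          # 1a
    ("S", ("b", ("S", "S"))),          # 1b
    (("S", "S"), ("a", ("S", "S"))),   # 1c
    (("S", "S"), ()),                  # 1d
}
pruned = prune_with_links(unpruned, link_pairs({"S"}, L))
```

Because the constraint is only a necessary condition for usefulness, a standard useless-production pass may still remove more; the point of the link test is that it can be applied while the schema instances are being generated.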



2.4 Optimal choice of L

Because $\Rightarrow^{\star}_{L}$ increases monotonically with
$\Rightarrow_{L}$ and hence $L$, we typically reduce the size of
$\mathcal{LC}_L(G)$ by making the left-corner production set $L$ as small
as possible. This section shows how to find the unique minimal set of
left-corner productions $L$ such that $\mathcal{LC}_L(G)$ is not
left-recursive.

Assume $G = (V, T, P, S)$ is pruned (i.e., $P$ contains no useless
productions) and that there is no $A \in V$ such that
$A \Rightarrow^{+}_{G} A$ (i.e., $G$ does not generate recursive unary
branching chains). For reasons of space we also assume that $P$ contains
no $\epsilon$-productions, but this approach can be extended to deal with
them if desired. A production $A \rightarrow B\,\beta \in P$ is
left-recursive iff
$\exists \gamma \in (V \cup T)^{\star}.\ B \Rightarrow^{\star}_{P} A\gamma$,
i.e., $P$ rewrites $B$ into a string beginning with $A$. Let $L_0$ be the
set of left-recursive productions in $G$. Then we claim (1) that
$\mathcal{LC}_{L_0}(G)$ is not left-recursive, and (2) that for all
$L \subset L_0$, $\mathcal{LC}_L(G)$ is left-recursive.

Claim 1 follows from the fact that if $A \Rightarrow^{\star}_{L} B\gamma$
then $A \Rightarrow^{\star}_{P} B\gamma$ and the constraints in section 2.3
on useful productions of $\mathcal{LC}_L(G)$. Claim 2 follows from the
fact that if $L \subset L_0$ then there is a chain of left-recursive
productions that includes a top-down production; a simple induction on the
length of the chain shows that $\mathcal{LC}_L(G)$ is left-recursive.

This result justifies the common practice in natural language left-corner
parsing of taking the terminals to be the preterminal part-of-speech tags,
rather than the lexical items themselves. (We did not attempt to calculate
the size of such a left-corner grammar in the empirical evaluation below,
but it would be much larger than any of the grammars described there). In
fact, if the preterminals are distinct from the other nonterminals (as
they are in the tree-bank grammars investigated below) then $L_0$ does not
include any productions beginning with a preterminal, and
$\mathcal{LC}_{L_0}(G)$ contains no instances of schema 1a at all. We now
turn our attention to the other schemata of the selective left-corner
grammar transform.

2.5 Factoring the output of $\mathcal{LC}_L$

This section defines two factorizations of the output of the selective
left-corner grammar transform that can dramatically reduce its size. These
factorizations are most effective if the number of productions is much
larger than the number of nonterminals, as is usually the case with
tree-bank grammars.

The top-down factorization decomposes schema 1b by introducing new
nonterminals $D'$, where $D \in V$, that have the same expansions that $D$
does in $G$. Using the same interpretation for variables as in schemata 1,
if $G = (V, T, P, S)$ then
$\mathcal{LC}^{(td)}_L(G) = (V_{td}, T, P_{td}, S)$, where:

$V_{td} = V_1 \cup \{D' : D \in V\}$

and $P_{td}$ contains all instances of the schemata 1a, 3a, 3b, 1c
and 1d.

$D \rightarrow A'\ D\text{-}A$    where $A \rightarrow \alpha \in P - L$     (3a)
$A' \rightarrow \alpha$    where $A \rightarrow \alpha \in P - L$            (3b)

Notice that the number of instances of schema 3a is less than the square
of the number of nonterminals and that the number of instances of
schema 3b is the number of top-down productions; the sum of these numbers
is usually much less than the number of instances of schema 1b.

Top-down factoring plays approximately the same role as "non-left-recursion
grouping" (NLRG) does in Moore's (2000) approach. The major difference is
that NLRG applies to all productions $A \rightarrow B\,\beta$ in which $B$
is not left-recursive, i.e.,
$\neg\exists \gamma.\ B \Rightarrow^{+}_{P} B\gamma$, while in our system
top-down factorization applies to those productions for which
$\neg\exists \gamma.\ B \Rightarrow^{\star}_{P} A\gamma$, i.e., the
productions not directly involved in left recursion.

The left-corner factorization decomposes schema 1c in a similar way using
new nonterminals $D\backslash X$, where $D \in V$ and $X \in V \cup T$.
$\mathcal{LC}^{(lc)}_L(G) = (V_{lc}, T, P_{lc}, S)$, where:

$V_{lc} = V_1 \cup \{D\backslash X : D \in V, X \in V \cup T\}$

and $P_{lc}$ contains all instances of the schemata 1a, 1b, 4a, 4b
and 1d.

$D\text{-}B \rightarrow C\backslash B\ D\text{-}C$    where $C \rightarrow B\,\beta \in L$   (4a)
$C\backslash B \rightarrow \beta$    where $C \rightarrow B\,\beta \in L$                    (4b)

The number of instances of schema 4a is bounded by the number of instances
of schema 1c and is typically much smaller, while the number of instances
of schema 4b is precisely the number of left-corner productions $L$.

Left-corner factoring seems to correspond to one step of Moore's (2000)
"left factor" (LF) operation. The left factor operation constructs new
nonterminals corresponding to common prefixes of arbitrary length, while
left-corner factoring effectively only factors the first nonterminal
symbol on the right hand side of left-corner productions. While we have
not done experiments, Moore's left factor operation would seem to reduce
the total number of symbols in the transformed grammar at the expense of
possibly introducing additional productions, while our left-corner
factoring reduces the number of productions.

These two factorizations can be used together in the obvious way to define
a grammar transform $\mathcal{LC}^{(td,lc)}_L$, whose productions are
defined by schemata 1a, 3a, 3b, 4a, 4b and 1d. There are corresponding
tree transforms, which we refer to as $T^{(td)}_L$, etc., below. Of
course, the pruning constraints described in section 2.3 are applicable
with these factorizations, and corresponding invertible tree transforms
can be constructed.
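
The minimal left-corner set of section 2.4 and the doubly factored schemata can be sketched together as follows. This is an illustration under stated assumptions rather than the authors' code: productions are `(lhs, rhs)` pairs without epsilon productions, the new nonterminals are encoded as tagged tuples (`("-", D, X)` for D-X, `("'", A)` for A', and `("\\", C, B)` for the left-corner factoring nonterminal), no link-constraint pruning is applied, and the function names and toy grammar are hypothetical.

```python
def left_recursive_productions(P):
    """Productions A -> B beta such that B rewrites to a string beginning
    with A (assumes P has no epsilon productions)."""
    direct = {}
    for A, rhs in P:
        direct.setdefault(A, set()).add(rhs[0])

    def left_corners(B):                       # reflexive-transitive closure
        seen, stack = {B}, [B]
        while stack:
            for X in direct.get(stack.pop(), ()):
                if X not in seen:
                    seen.add(X)
                    stack.append(X)
        return seen

    return {(A, rhs) for A, rhs in P if A in left_corners(rhs[0])}


def factored_selective_left_corner(V, T, P, L):
    """Instances of schemata 1a, 3a, 3b, 4a, 4b and 1d (no pruning)."""
    out = set()
    for A, alpha in P - L:
        out.add((("'", A), alpha))                               # 3b: A' -> alpha
    for C, rhs in L:
        out.add((("\\", C, rhs[0]), rhs[1:]))                    # 4b: C\B -> beta
    for D in V:
        for w in T:
            out.add((D, (w, ("-", D, w))))                       # 1a: D -> w D-w
        for A, _ in P - L:
            out.add((D, (("'", A), ("-", D, A))))                # 3a: D -> A' D-A
        for C, rhs in L:
            B = rhs[0]
            out.add((("-", D, B), (("\\", C, B), ("-", D, C))))  # 4a
        out.add((("-", D, D), ()))                               # 1d: D-D -> epsilon
    return out


# Toy grammar (hypothetical), with part-of-speech tags as terminals.
V = {"S", "NP", "VP", "PP"}
T = {"Det", "N", "V", "P"}
P = {("S", ("NP", "VP")), ("NP", ("NP", "PP")), ("NP", ("Det", "N")),
     ("VP", ("V", "NP")), ("PP", ("P", "NP"))}
L0 = left_recursive_productions(P)
factored = factored_selective_left_corner(V, T, P, L0)
```

In this toy grammar only NP -> NP PP is left-recursive, so it is the only production that needs to be recognized left-corner. Schemata 3a and 4a are generated once per pair of categories rather than once per production, which is where the saving over schemata 1b and 1c comes from when productions greatly outnumber nonterminals.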



3 Empirical Results 

To examine the effect of the transforms outlined above, we experimented
with various PCFGs induced from sections 2-21 of a modified Penn WSJ
tree-bank as described in Johnson (1998b) (i.e., labels simplified to
grammatical categories, ROOT nodes added, empty nodes and vacuous unary
branches deleted, and auxiliaries retagged as AUX or AUXG). We ignored
lexical items, and treated the part-of-speech tags as terminals. As Bob
Moore pointed out to us, the left-corner transform may produce
left-recursive grammars if its input grammar contains unary cycles, so we
removed them using a transform that Moore suggested. Given an initial
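set of (non-epsilon) productions $P$, the transformed grammar contains the
following productions, where the $A^{\wedge}$ are new non-terminals:

$A \rightarrow \alpha$    where $A \rightarrow \alpha \in P$, $A \not\Rightarrow^{+}_{P} A$
$A \rightarrow D^{\wedge}$    where $A \Rightarrow^{+}_{P} D \Rightarrow^{+}_{P} A$
$A^{\wedge} \rightarrow \alpha$    where $A \rightarrow \alpha \in P$, $A \Rightarrow^{+}_{P} A$, $\alpha \not\Rightarrow^{\star}_{P} A$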



This transform can be extended to one on PCFGs which preserves derivation
probabilities. In this section, we fix $P$ to be the productions that
result after applying this unary cycle removal transformation to the
tree-bank productions, and $G$ to be the corresponding grammar.
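
Whether a grammar needs this preprocessing step can be checked by looking for nonterminals that rewrite to themselves through unary productions. The sketch below only detects such nonterminals; it does not implement the removal transform itself. It assumes productions are `(lhs, rhs)` pairs with no epsilon productions, so that unary cycles can only arise through chains of unary productions; the function name and toy grammar are hypothetical.

```python
def unary_cycle_nonterminals(P, V):
    """Nonterminals A that can rewrite to A in one or more steps using only
    unary productions A -> B with B a nonterminal."""
    unary = {}
    for A, rhs in P:
        if len(rhs) == 1 and rhs[0] in V:
            unary.setdefault(A, set()).add(rhs[0])
    cyclic = set()
    for A in V:
        seen, stack = set(), list(unary.get(A, ()))
        while stack:                      # nonterminals reachable in >= 1 unary step
            B = stack.pop()
            if B not in seen:
                seen.add(B)
                stack.extend(unary.get(B, ()))
        if A in seen:
            cyclic.add(A)
    return cyclic


# Toy grammar (hypothetical): A -> B and B -> A form a unary cycle.
V = {"S", "A", "B"}
P = {("S", ("A", "c")), ("A", ("B",)), ("B", ("A",)), ("B", ("b",))}
cyclic = unary_cycle_nonterminals(P, V)
```

An empty result means the precondition on unary cycles noted in section 2 already holds for the input grammar.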

Tables 1 and 2 give the sizes of selective left-corner grammar transforms
of $G$ for various values of the left-corner set $L$ and factorizations,
without and with epsilon-removal respectively. In the tables, $L_0$ is the
set of left-recursive productions in $P$, as defined in section 2.4. $N$
is the set of productions in $P$ whose right-hand sides do not begin with
a part-of-speech (POS) tag; because POS tags are distinct from other
nonterminals in the tree-bank, $N$ is an easily identified set of
productions guaranteed to include $L_0$. The tables also give the sizes of
maximum-likelihood PCFGs estimated from the trees resulting from applying
the selective left-corner tree transforms $T$ to the tree-bank, breaking
unary cycles as described above. For the parsing experiments below we
always deleted empty nodes in the output of these tree transforms; this
corresponds to epsilon removal in the grammar transform.

First, note that $\mathcal{LC}_P(G)$, the result of applying the standard
left-corner grammar transform to $G$, has approximately 20 times the
number of productions that $G$ has. However
$\mathcal{LC}^{(td,lc)}_{L_0}(G)$, the result of applying the selective
left-corner grammar transformation with factorization, has approximately
1.4 times the number of productions that $G$ has. Thus the methods
described in this paper can in fact dramatically reduce the size of
left-corner transformed grammars. Second, note that
$\mathcal{LC}^{(td,lc)}_{N}(G)$ is not much larger than
$\mathcal{LC}^{(td,lc)}_{L_0}(G)$. This is because $N$ is not much larger
than $L_0$, which in turn is because most pairs of non-POS nonterminals
$A$, $B$ are mutually left-recursive.

Table 1: Sizes of PCFGs inferred using various grammar and tree transforms
after pruning with link constraints without epsilon removal. Columns
indicate factorization. In the grammar and tree transforms, $P$ is the set
of productions in $G$ (i.e., the standard left-corner transform), $N$ is
the set of all productions in $P$ which do not begin with a POS tag, and
$L_0$ is the set of left-recursive productions.

[Table 1 body, surviving entries in source order]
$\mathcal{LC}_{L_0}$
$T_P$    17,146    19,002
18,945    16,126    18,437    15,618

Table 2: Sizes of PCFGs inferred using various grammar and tree transforms
after pruning with link constraints with epsilon removal, using the same
notation as Table 1.

[Table 2 body, surviving entries in source order]
25,335
$\mathcal{LC}_{L_0}$    505,435    157,899    371,102    23,566
$T_P$    22,035    17,398
$T_N$    21,589    16,688    20,696    15,795
$T_{L_0}$    21,061    16,566    20,168    15,673

Turning now to the PCFGs estimated after applying tree transforms, we
notice that grammar size does not increase nearly so dramatically. These
PCFGs encode a maximum-likelihood estimate of the state transition
probabilities for various stochastic generalized left-corner parsers,
since a top-down parser using these grammars simulates a generalized
left-corner parser. The fact that $\mathcal{LC}_P(G)$ is 17 times larger
than the PCFG inferred after applying $T_P$ to the tree-bank means that
most of the possible transitions of a standard stochastic left-corner
parser are not observed in the tree-bank training data. The state of a
left-corner parser does capture some linguistic generalizations (Manning
and Carpenter, 1997; Roark and Johnson, 1999), but one might still expect
sparse-data problems. Note that $\mathcal{LC}^{(td,lc)}_{L_0}$ is only 1.4
times larger than $T^{(td,lc)}_{L_0}$, so we expect less serious sparse
data problems with the factored selective left-corner transform.

We quantify these sparse data problems in two ways using a held-out test
corpus, viz., all sentences in section 23 of the tree-bank. First, table 3
lists the number of sentences in the test corpus that fail to receive a
parse with the various PCFGs mentioned above. This is a relatively crude
measure, but correlates roughly with the ratios of grammar sizes, as
expected.

Table 3: The number of sentences in section 23 that do not receive a parse
using various grammars estimated from sections 2-21.

Table 4: The number of productions found in the transformed trees of
sentences in section 23 that do not appear in the corresponding
transformed trees from sections 2-21. (The subscript epsilon indicates
epsilon removal was applied).

Second, table 4 lists the number of productions found in the
tree-transformed test corpus that do not appear in the correspondingly
transformed trees of sections 2-21. What is striking here is that the
number of missing productions after either of the transforms
$T^{(td,lc)}_{L_0}$ or $T^{(td,lc)}_{N}$ is approximately the same as the
number of missing productions using the untransformed trees, indicating
that the factored selective left-corner transforms cause little or no
additional sparse data problem. (The relationship between local trees in
the parse trees of $G$ and $\mathcal{LC}_L(G)$ mentioned earlier implies
that left-corner tree transformations will not decrease the number of
missing productions).
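
The missing-production measure can be sketched in a few lines. This is only an illustration of the counting step: trees are assumed to be nested tuples `(label, child, ...)` whose leaves are part-of-speech tag strings, distinct productions (types rather than tokens) are counted, which is an assumption about how the figures were tallied, and the function names and toy trees are hypothetical.

```python
def productions(tree):
    """Yield (label, child_labels) for every local tree; leaves are strings."""
    label, children = tree[0], tree[1:]
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for c in children:
        if not isinstance(c, str):
            yield from productions(c)


def missing_productions(train_trees, test_trees):
    """Distinct productions used in the test trees but never seen in training."""
    seen = {p for t in train_trees for p in productions(t)}
    return {p for t in test_trees for p in productions(t)} - seen


# Toy trees (hypothetical); POS tags act as terminals.
train = [("S", ("NP", "DT", "NN"), ("VP", "VBD"))]
test = [("S", ("NP", "NN"), ("VP", "VBD"))]
missing = missing_productions(train, test)
```

The same function applies unchanged to transformed trees, since a tree transform only changes the labels and shapes of the local trees being counted.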

We also investigate the accuracy of the maximum-likelihood parses (MLPs)
obtained using the PCFGs estimated from the output of the various
left-corner tree transforms.¹ We searched for these parses using an
exhaustive CKY parser. Because the parse trees of these PCFGs are
isomorphic to the derivations of the corresponding stochastic generalized
left-corner parsers, we are in fact evaluating different kinds of
stochastic generalized left-corner parsers inferred from sections 2-21 of
the tree-bank. We used the transform-detransform framework described in
Johnson (1998b) to evaluate the parses, i.e., we applied the appropriate
inverse tree transform $T^{-1}$ to detransform the parse trees produced
using the PCFG estimated from trees transformed by $T$. By calculating the
labelled precision and recall scores for the detransformed trees in the
usual manner, we can systematically compare the parsing accuracy of
different kinds of stochastic generalized left-corner parsers.

Table 5: Labelled recall and precision scores of PCFGs estimated using
various tree-transforms in a transform-detransform framework using test
data from section 23.
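
Labelled precision and recall compare the labelled constituents of a detransformed parse with those of the gold tree. The sketch below scores a single pair of trees, assuming trees are nested tuples `(label, child, ...)` with string leaves and that every constituent, including the root, is counted; standard evaluation tools aggregate over a corpus and apply further conventions that are not reproduced here. The function names and toy trees are hypothetical.

```python
from collections import Counter


def labelled_constituents(tree, start=0):
    """Return (Counter of (label, start, end) spans, end position)."""
    label, children = tree[0], tree[1:]
    spans, pos = Counter(), start
    for c in children:
        if isinstance(c, str):            # a leaf covers one position
            pos += 1
        else:
            sub, pos = labelled_constituents(c, pos)
            spans += sub
    spans[(label, start, pos)] += 1
    return spans, pos


def labelled_precision_recall(gold, parsed):
    g, _ = labelled_constituents(gold)
    p, _ = labelled_constituents(parsed)
    matched = sum((g & p).values())       # multiset intersection
    return matched / sum(p.values()), matched / sum(g.values())


# Toy trees (hypothetical): the parse misses the object NP.
gold = ("S", ("NP", "DT", "NN"), ("VP", "VBD", ("NP", "NN")))
parsed = ("S", ("NP", "DT", "NN"), ("VP", "VBD", "NN"))
precision, recall = labelled_precision_recall(gold, parsed)
```

Using a multiset keeps repeated identical constituents (for example, unary chains with the same label) from being matched more often than they occur.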

Table 5 presents the results of this comparison. As reported previously,
the standard left-corner grammar embeds sufficient non-local information
in its productions to significantly improve the labelled precision and
recall of its MLPs with respect to MLPs of the PCFG estimated from the
untransformed trees (Manning and Carpenter, 1997; Roark and Johnson,
1999). Parsing accuracy drops off as grammar size decreases, presumably
because smaller PCFGs have fewer adjustable parameters with which to
describe this non-local information. There are other kinds of non-local
information which can be incorporated into a PCFG using a
transform-detransform approach that result in an even greater improvement
of parsing accuracy (Johnson, 1998b). Ultimately
however, it seems that a more complex approach incorporating back-off and
smoothing is necessary in order to achieve the parsing accuracy achieved
by Charniak (1997) and Collins (1997).



¹ We did not investigate the grammars produced by the various left-corner
grammar transforms. Because a left-corner grammar transform
$\mathcal{LC}_L$ preserves production probabilities, the highest scoring
parses obtained using the weighted CFG $\mathcal{LC}_L(G)$ should be the
highest scoring parses obtained using $G$ transformed by $T_L$.



4 Conclusion 

This paper presented factored selective left-corner grammar transforms.
These transforms preserve the primary benefits of the left-corner grammar
transform (i.e., elimination of left-recursion and preservation of
annotations on productions) while dramatically ameliorating its principal
problems (grammar size and sparse data problems). This should extend the
applicability of left-corner techniques to situations involving large
grammars. We showed how to identify the minimal set $L_0$ of productions
of a grammar that must be recognized left-corner in order for the
transformed grammar not to be left-recursive. We also proposed two
factorizations of the output of the selective left-corner grammar
transform which further reduce grammar size, and showed that there is only
a minor increase in grammar size when the factored selective left-corner
transform is applied to a large tree-bank grammar. Finally, we exploited
the tree transforms that correspond to these grammar transforms to
formulate and study a class of stochastic generalized left-corner parsers.

This work could be extended in a number of ways. For example, in this
paper we assumed that one would always choose a left-corner production set
that includes the minimal set $L_0$ required to ensure that the
transformed grammar is not left-recursive. However, Roark and Johnson
(1999) report good performance from a stochastically-guided top-down
parser, suggesting that left-recursion is not always fatal. It might be
possible to judiciously choose a left-corner production set smaller than
$L_0$ which eliminates pernicious left-recursion, so that the remaining
left-recursive cycles have such low probability that they will effectively
never be used and a stochastically-guided top-down parser will never
search them.